Named Entity Recognition for Telugu News Articles using Naïve Bayes Classifier

Authors

  • SaiKiranmai Gorla
  • Sriharshitha Velivelli
  • N. L. Bhanu Murthy
  • Aruna Malapati
Abstract

Named Entity Recognition (NER) is the task of identifying names of persons, locations, organizations, etc. in a given sentence or document. In this paper, we classify textual content from online Telugu newspapers using a well-known generative model. We use generic features such as contextual words and their part-of-speech (POS) tags to build the learning model. Based on the syntax and grammar of the Telugu language, we propose morphological pre-processing of the data, a step that yields better accuracy. We also propose language-dependent features, namely a post-position feature, a clue word feature and a gazetteer feature, to improve the performance of the model. The model achieved an overall average F1-score of 88.87% for Person, 87.32% for Location and 72.69% for Organization.
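As a rough illustration of the kind of model the abstract describes, the following is a minimal sketch of a token-level Naïve Bayes tagger that uses contextual words, POS tags and a gazetteer flag as features. It is not the authors' implementation: the feature encoding, the add-one smoothing, the helper names (`token_features`, `NaiveBayes`, `tag_sentence`) and the toy English placeholder sentences are all assumptions made for the example, and the morphological pre-processing, post-position and clue word features are omitted.

```python
import math
from collections import Counter, defaultdict

def token_features(words, tags, i, gazetteer):
    """Feature strings for token i: contextual words, POS tag, gazetteer flag."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i < len(words) - 1 else "</s>"
    return [
        f"w={words[i]}",
        f"w-1={prev_w}",
        f"w+1={next_w}",
        f"pos={tags[i]}",
        f"gaz={words[i] in gazetteer}",
    ]

class NaiveBayes:
    """Multinomial Naive Bayes over feature strings (smoothing is an assumption)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.label_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, X, y):
        for feats, label in zip(X, y):
            self.label_counts[label] += 1
            for f in feats:
                self.feat_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n in self.label_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n / total)
            denom = sum(self.feat_counts[label].values()) + self.alpha * len(self.vocab)
            for f in feats:
                score += math.log((self.feat_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy data (English placeholders, not from the paper's corpus).
gazetteer = {"Hyderabad", "Chennai"}
train = [
    (["Ravi", "visited", "Hyderabad"], ["NNP", "VB", "NNP"], ["PER", "O", "LOC"]),
    (["Sita", "visited", "Chennai"], ["NNP", "VB", "NNP"], ["PER", "O", "LOC"]),
]

X, y = [], []
for words, tags, labels in train:
    for i, label in enumerate(labels):
        X.append(token_features(words, tags, i, gazetteer))
        y.append(label)

model = NaiveBayes().fit(X, y)

def tag_sentence(words, tags):
    """Predict one entity label per token."""
    return [model.predict(token_features(words, tags, i, gazetteer))
            for i in range(len(words))]

print(tag_sentence(["Gita", "visited", "Hyderabad"], ["NNP", "VB", "NNP"]))
```

Because each token is classified independently, a sketch like this ignores dependencies between neighbouring labels; the neighbouring words enter only as features.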


Similar resources

Mapping Arabic Wikipedia into the Named Entities Taxonomy

This paper describes a comprehensive set of experiments conducted in order to classify Arabic Wikipedia articles into predefined sets of Named Entity classes. We tackle the task using four different classifiers, namely: Naïve Bayes, Multinomial Naïve Bayes, Support Vector Machines, and Stochastic Gradient Descent. We report on several aspects related to classification models in the sense of feature repr...


NELIS - Named Entity and Language Identification System: Shared Task System Description

This paper proposes a simple and elegant solution for language identification and named entity (NE) recognition at a word level, as a part of Subtask-1: Query Word Labeling of FIRE 2015. Given any query q1:w1 w2 w3 ... wn in Roman script, the task calls for labeling words of the query as English (En) or a member of L, where L = {Bengali (Bn), Gujarati (Gu), Hindi (Hi), Kannada (Kn), Malayalam (...


Exploiting Multilingual Wikipedia to Improve Arabic Named Entity Resources

This paper focuses on the creation of Arabic named entity gazetteers, by exploiting Wikipedia and using the Naïve Bayes classifier to classify the named entities into the three main categories: person, location, and organization. The process of building the gazetteer starts with automatically creating the datasets. The dataset for the training is constructed using only Arabic text, whereas, the...


Named Entity Recognition in Crime News Documents Using Classifiers Combination

The increasing volume of crime information readily available on the web makes manually retrieving, analyzing and using the valuable information in such texts a very difficult task. This work focuses on designing models for extracting crime-specific information from the Web. Thus, this paper proposes an ensemble framework for the crime named entity recognition task. The...


Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...



Publication date: 2018